Risk assessment of interstate pipelines using a fuzzy-clustering approach

Interstate pipelines are the most efficient and feasible means of transporting crude oil and gas within borders. Assessing the risks of these pipelines remains challenging despite the evolution of computational fuzzy inference systems (FIS). The computational complexity grows with the number of system variables, especially in the typical Takagi–Sugeno (T–S) fuzzy model: the number of rules rises exponentially as the number of system variables increases, so it is infeasible to specify the rules exhaustively for pipeline risk assessment. This work proposes an indexing pipeline risk assessment approach integrated with subtractive clustering fuzzy logic to address the uncertainty of real-world circumstances. Hypothetical data are used to set up the subtractive clustering fuzzy model using the fundamental rules and scores of the pipeline risk assessment indexing method. An interstate crude-oil pipeline in Egypt is used as a case study to demonstrate the proposed approach.


Traditional indexing method
The traditional indexing method is a subjective evaluation tool for assessing pipeline risks based on a combination of statistical failure data and operator experience. The pipeline is divided into segments based on factors such as population, land type, soil condition, coating condition, pipeline age, or any other factors determined by the evaluator.
This approach rests on several assumptions: all risks are independent and additive; the worst-case condition is assigned to each pipeline section; all point values are relative rather than absolute; the relative importance of each item is based on expert evaluations; and only risks to the public are considered, with no consideration given to pipeline operators or contractors.
Data are obtained to create an index for each type of pipeline failure initiation: (a) third-party damage, (b) corrosion, (c) design, and (d) incorrect operations. Fig. 1 shows the basic risk assessment model. These four indices score the likelihood and significance of all elements that increase or decrease the likelihood of a pipeline failure. The indices are then added together to give the Index Sum (IS), as stated in Eq. (1). As the index sum score increases, so does the probability of risk, and vice versa. The evaluation concludes with an assessment of the consequences of a pipeline failure. The leak impact factor (LIF) is a consequence factor used to adjust the index sum score to reflect the consequences of failure, with a higher score representing a greater risk. The leak impact factor is the sum of the product hazard (acute + chronic), leak volume, receptors, and dispersion factor, as stated in Eq. (2), where the dispersion factor (DF) equals the leak volume score (LV) divided by the receptors population score (RE), as indicated in Eq. (3). As given in Eq. (4), the relative risk score (RRS) equals the index sum divided by the leak impact factor [45].
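The scoring arithmetic described in words above can be illustrated with a short sketch. It covers only the relations stated unambiguously in the text: Eq. (1) as a sum of the four indices, Eq. (3) as a ratio, and Eq. (4) as a ratio. The function names and the four individual index scores are hypothetical; the totals 250 and 36 are the ones used in the worked example later in the paper. The leak impact factor is taken as a given input rather than computed.

```python
def index_sum(third_party_damage, corrosion, design, incorrect_operations):
    """Eq. (1) as described in the text: the four index scores are added."""
    return third_party_damage + corrosion + design + incorrect_operations


def dispersion_factor(leak_volume_score, receptors_score):
    """Eq. (3) as described: leak volume score divided by receptors score."""
    if receptors_score == 0:
        raise ValueError("receptors score must be nonzero")
    return leak_volume_score / receptors_score


def relative_risk_score(is_score, leak_impact_factor):
    """Eq. (4) as described: index sum divided by leak impact factor."""
    if leak_impact_factor <= 0:
        raise ValueError("leak impact factor must be positive")
    return is_score / leak_impact_factor


# Hypothetical index scores chosen so that they add up to 250.
is_score = index_sum(70, 60, 50, 70)
# LIF = 36 is supplied directly (assumption: it was computed elsewhere).
rrs = relative_risk_score(is_score, 36)
print(is_score, round(rrs, 2))
```

Keeping the three relations in separate functions makes it easy to swap in whichever form of Eq. (2) the assessor uses for the leak impact factor.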

Fuzzy inference system
The fundamental principle of fuzzy set theory was introduced by Zadeh (1965) [4] to handle uncertainty in real-life circumstances. Fuzzy logic is used to solve problems with unsharp boundaries, where membership is a matter of degree. A fuzzy set defined on a universe of discourse U is characterized by a membership function µ(x) that takes values in the interval [0, 1], where 0 indicates non-membership and 1 indicates full membership. A membership function quantifies the degree to which an element of U belongs to the fuzzy set. Fuzzy sets are defined for linguistic variables, and each linguistic term can be expressed by a membership function of triangular, trapezoidal, or Gaussian form. The selection of the membership function is mostly determined by the features of the variable, the available information, and expert opinion [26]. In this work, Gaussian membership functions are employed because they are the most natural [22], smooth, and nonzero at all points [46]. As a result, they can handle real problems with uncertain and vague data, as in risk assessment studies. The Gaussian membership function can be represented as illustrated in Eq. (5).
where c_i and σ_i are the center and width of the ith fuzzy set A_i, respectively, as shown in Fig. 2. A fuzzy inference system maps an input space (universe of discourse) to an output space using fuzzy logic. The primary mechanisms for achieving this are a list of IF-THEN rules, membership functions that describe how each point in the input space is mapped to a degree of membership between 0 and 1, and fuzzy logic operators that combine the fuzzy sets. As shown in Fig. 3, a fuzzy inference system consists of (1) a knowledge base, (2) an inference or decision-making unit, (3) a fuzzification interface, and (4) a defuzzification interface.
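As an illustration of the Gaussian membership function described above, the following sketch evaluates the degree of membership for a given center and width. It assumes the common convention exp(-(x - c)^2 / (2σ^2)); the exact scaling used in Eq. (5) of the paper may differ, and the function name is hypothetical.

```python
import math


def gaussian_mf(x, center, sigma):
    """Degree of membership of x in a Gaussian fuzzy set.

    Assumed form: exp(-(x - center)^2 / (2 * sigma^2)).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


# Membership is 1 at the center and decays smoothly, never reaching zero.
for value in (5.0, 7.0, 9.0):
    print(value, gaussian_mf(value, center=5.0, sigma=2.0))
```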
Several fuzzy inference models are applied in numerous applications, such as the Mamdani, Takagi-Sugeno, and Tsukamoto fuzzy models. The Takagi-Sugeno and Mamdani approaches are commonly used to model real-world situations, and the two techniques are similar in many ways: the first two steps of the fuzzy inference process, fuzzification of the inputs and application of the fuzzy operators, are identical. The primary distinction is that Takagi-Sugeno output membership functions are either linear or constant. The Takagi-Sugeno approach is applied in this study to assess potential pipeline risks.
Figure 1. The basic risk assessment model.
The TS model was introduced by Takagi and Sugeno in 1985. Its major feature is that each fuzzy rule is expressed as a linear subsystem, which makes it suitable for modelling complicated nonlinear systems [48]. The output is a combination of all of these linear subsystems, obtained by rule aggregation. The TS fuzzy model can represent nonlinear systems with high precision and has been accepted as a universal approximator of any smooth nonlinear system [49,50]. TS rules use functions of the input variables as the rule output (consequent). The general form of a TS rule with two inputs x_1 and x_2 and one output is as follows:
IF x_1 is A_1 AND x_2 is A_2 THEN z = f(x_1, x_2)
where z = f(x_1, x_2) is the crisp output function, and A_1 and A_2 are linguistic terms. Figure 4 depicts a typical TS inference mechanism for two input variables.
This function is most typically linear, with fuzzy rules created linearly from input-output data, although nonlinear functions are used by adaptive approaches 51 .
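The rule aggregation described above can be sketched as a first-order TS inference step. This is a minimal sketch under stated assumptions, not the authors' implementation: rule firing strengths are taken as the product of Gaussian memberships, and the crisp output is the firing-strength-weighted average of the linear rule outputs. The rule dictionary layout (`centers`, `sigmas`, `coeffs`) is hypothetical.

```python
import math


def gaussian_mf(x, center, sigma):
    """Assumed Gaussian membership: exp(-(x - c)^2 / (2 * sigma^2))."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def ts_infer(inputs, rules):
    """Weighted-average output of first-order Takagi-Sugeno rules.

    Each rule holds one Gaussian set per input ('centers', 'sigmas') and
    linear consequent coefficients 'coeffs' (one per input, then a bias).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for rule in rules:
        # Firing strength: product of the antecedent memberships (AND).
        strength = 1.0
        for x, c, s in zip(inputs, rule["centers"], rule["sigmas"]):
            strength *= gaussian_mf(x, c, s)
        # Linear consequent z = a1*x1 + ... + an*xn + bias.
        coeffs = rule["coeffs"]
        z = sum(a * x for a, x in zip(coeffs, inputs)) + coeffs[-1]
        weighted_sum += strength * z
        weight_total += strength
    if weight_total == 0.0:
        raise ValueError("no rule fired for the given inputs")
    return weighted_sum / weight_total


# Two hypothetical single-input rules with constant outputs 0 and 10.
low = {"centers": [0.0], "sigmas": [2.0], "coeffs": [0.0, 0.0]}
high = {"centers": [10.0], "sigmas": [2.0], "coeffs": [0.0, 10.0]}
print(ts_infer([5.0], [low, high]))
```

An input midway between the two rule centers fires both rules equally, so the output lies halfway between the two consequents.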
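The preceding section introduced the four variables of the Index Sum (IS) model: corrosion (C), third-party damage (TPD), incorrect operations (IO), and design (D). The fuzzy IF-THEN rules of this model take the following form:
If (C is ..), AND (TPD is ..), AND (IO is ..), AND (D is ..), THEN IS = f(C, TPD, IO, D)
Similarly, the four variables of the Leak Impact Factor (LIF) model are product hazard (PH), dispersion factor (DF), leak volume (LV), and receptors (RE). The fuzzy IF-THEN rules of this model take the following form:
If (PH is ..), AND (DF is ..), AND (LV is ..), AND (RE is ..), THEN LIF = f(PH, DF, LV, RE)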

Research methodology
The risk assessment of pipelines can be modelled qualitatively from expert knowledge of the system, expressed as input and output data. Fuzzy clustering is a powerful identification tool for systems that contain uncertainty: it groups the input-output data into fuzzy clusters and then translates these clusters into fuzzy IF-THEN rules. This avoids having to specify all of the rules, as is required in conventional fuzzy inference methods. There are several fuzzy clustering methods, the most common of which are fuzzy C-means (FCM) clustering [52,53], mountain clustering [54], and subtractive clustering [55,56].
The subtractive clustering method is used in this research. Like the mountain clustering method, it can automatically determine the number and initial location of cluster centres using search techniques, whereas fuzzy C-means clustering requires prior knowledge of the number of clusters. A further advantage of subtractive clustering over mountain clustering is that it treats each data point as a potential cluster centre, whereas mountain clustering treats each grid point as a potential cluster centre [57].
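To make the idea concrete, the sketch below implements subtractive clustering in its commonly described form: every data point receives a potential that counts its near neighbours, the point with the highest potential becomes a centre, and the potential around that centre is subtracted before the next search. It is an illustrative sketch, not the MATLAB routine used in this study. The function name, the default squash factor (1.25), and the accept/reject ratios (0.5 and 0.15) are assumptions, and the data are assumed to be scaled to the unit hypercube.

```python
import math


def subtractive_clustering(points, radius=0.5, squash=1.25,
                           accept_ratio=0.5, reject_ratio=0.15):
    """Return cluster centres selected from `points` (assumed in [0, 1])."""
    alpha = 4.0 / radius ** 2
    beta = 4.0 / (squash * radius) ** 2

    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Every data point is a candidate centre; dense regions score highest.
    potential = [sum(math.exp(-alpha * sqdist(p, q)) for q in points)
                 for p in points]
    first_peak = max(potential)
    centres = []
    while True:
        k = max(range(len(points)), key=potential.__getitem__)
        peak = potential[k]
        ratio = peak / first_peak
        if ratio > accept_ratio:
            accept = True
        elif ratio < reject_ratio:
            break  # remaining potential is too low: stop searching
        else:
            # Grey zone: accept only if far enough from existing centres.
            d_min = min(math.sqrt(sqdist(points[k], c)) for c in centres)
            accept = d_min / radius + ratio >= 1.0
        if accept:
            centres.append(points[k])
            # Subtract the accepted centre's influence from all potentials.
            potential = [p - peak * math.exp(-beta * sqdist(x, points[k]))
                         for p, x in zip(potential, points)]
        else:
            potential[k] = 0.0
    return centres


# Hypothetical data: two tight groups of points in the unit square.
data = [(0.10, 0.10), (0.12, 0.10), (0.10, 0.12), (0.11, 0.11),
        (0.90, 0.90), (0.88, 0.90), (0.90, 0.88), (0.89, 0.89)]
print(subtractive_clustering(data, radius=0.5))
```

The number of centres is not specified in advance; it follows from the radius and the acceptance thresholds, which is the property that distinguishes this method from fuzzy C-means.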
Professional suggestions from previous studies [33–45] are used to obtain data for the IS and LIF models. The input parameters of the IS model are TPD, C, D, and IO, while those of the LIF model are PH, LV, DF, and RE. The two models presented in this paper use a set of statistical data consisting of 625 input/output data points, a portion of which is shown in Table 1 for the IS model and Table 2 for the LIF model.

Performance evaluation indices
Two indices, the root mean square error (RMSE) and the coefficient of determination (R²), are used to evaluate the performance of each model by comparing the outputs estimated by the established model with the expert's output data. These indices are computed as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(A_i - P_i\right)^2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(A_i - P_i\right)^2}{\sum_{i=1}^{N}\left(A_i - \bar{A}\right)^2}$$
where P_i is the predicted value, A_i is the qualitative expert's value, Ā is the average of the observed set, and N is the number of data points.
The RMSE, one of the most commonly used indices in performance evaluation, quantifies the difference between the model output and the actual value. It is a nonnegative number with no upper bound, and it equals zero when the predicted output exactly matches the recorded output. R² indicates how much of the variability in the dependent variable can be explained by the independent variable(s), and hence how well the model fits the data. R² takes values between 0 and 1, where 1 indicates that the model captures all the variability of the output variable, while 0 indicates a poor correlation between the model output and the actual output.
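The two performance indices can be computed as in the following sketch. It assumes the standard definitions: RMSE as the square root of the mean squared difference, and R² as one minus the ratio of the residual sum of squares to the total sum of squares about the mean of the expert values. The function names and the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between expert values and model outputs."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Assumed form: 1 - SS_res / SS_tot, with SS_tot about the mean of `actual`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical expert scores and model outputs.
expert = [1.0, 2.0, 3.0, 4.0]
model = [1.0, 2.0, 3.0, 5.0]
print(rmse(expert, model), r_squared(expert, model))
```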
As shown in Figs. 5 and 6, the index sum model and the leak impact factor model each have four inputs and a single output reflecting the risk determined by expert knowledge. As shown in Tables 1 and 2, of the 625 data points for the index sum and leak impact factor models, 500 are used for training, i.e., to form the membership functions and produce the fuzzy IF-THEN rules, and 125 are used for testing and checking the fuzzy model established in each case, in order to validate the model and to prevent overfitting to the training data set.
MATLAB software is used to perform subtractive clustering on the pipeline index sum and leak impact factor data. In each model, the algorithm is repeated for cluster radii from 0.1 through 0.9. After applying the subtractive clustering method to the training data, the best-fitting index sum and leak impact factor models, selected by their performance indices on the testing data set, have cluster radii of 0.8 and 0.6, respectively. The rules extracted for the leak impact factor model have antecedents of the form:
If (PH is ..), AND (LV is ..), AND (RE is ..), AND (DF is ..)
The interdependence of the input and output parameters derived from the subtractive clustering rules can be demonstrated using control surfaces, as shown in Figs. 7 and 8 for the index sum and leak impact factor, respectively. For the index sum model, Fig. 7(a1) shows the dependence of the index sum on design and corrosion, Fig. 7(b1) on incorrect operations and corrosion, and Fig. 7(c1) on third-party damage and corrosion. For the leak impact factor model, Fig. 8(a2) shows the dependence of the leak impact factor on the dispersion factor and the product hazard, Fig. 8(b2) on the leak volume and the product hazard, and Fig. 8(c2) on the receptors and the product hazard.
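The search over cluster radii described above amounts to a small model-selection loop. The sketch below assumes that the selection criterion is the lowest RMSE on the testing data; the text refers to the performance indices more generally. The function `select_cluster_radius` and the stand-in scorer `_toy_rmse` are hypothetical, and the scorer's values are arbitrary rather than results from the study.

```python
def select_cluster_radius(radii, test_rmse_for_radius):
    """Return (best_radius, best_rmse) over the candidate radii.

    `test_rmse_for_radius` is a caller-supplied function that trains a
    model with the given radius and returns its RMSE on the testing data.
    """
    scores = {r: test_rmse_for_radius(r) for r in radii}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Candidate radii 0.1, 0.2, ..., 0.9, as in the text.
RADII = [k / 10 for k in range(1, 10)]


def _toy_rmse(radius):
    """Arbitrary stand-in for 'train and evaluate'; not real results."""
    return abs(radius - 0.5)


best_radius, best_rmse = select_cluster_radius(RADII, _toy_rmse)
print(best_radius, best_rmse)
```

In practice the stand-in scorer would be replaced by a routine that builds the fuzzy model from the 500 training points and evaluates it on the 125 testing points.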

Case study
The SUMED pipeline (Fig. 9) is used as a case study to demonstrate the proposed pipeline risk assessment approach. The SUMED pipeline is critical for the international energy market because it carries exported crude oil delivered by very large crude carriers (VLCCs) that sail from Gulf countries and pass through the Suez Canal on their way to Europe and/or the United States [58–60]. These large tankers cannot pass through the Suez Canal fully loaded because their draught exceeds the Canal's depth. The loaded tankers are therefore moored to a single point mooring (SPM) system at the Ain Sukhna terminal before passing through the Canal. The crude oil is discharged from the tanker via the SPM piping system to the pipeline, and the tankers can then pass through the Canal in ballast with a low draught. Crude oil is transported through two parallel pipelines, 42 …
Table 1. Statistical description of the data set of the IS model.
The pipeline's wall thickness ranges from 11.13 mm to 22.22 mm depending on the design [58]. The 320 km pipeline is divided into seven sections of varying lengths. The sectioning takes the following factors into account: the type of land, soil condition, atmospheric type, population density, river and waterway crossings, high/low lands, and the presence of a Right of Way (ROW). The distances and characteristics of the sections are shown in Table 5.
The risk assessment is performed on each pipeline section separately using the traditional method and the proposed model, and the results of both methods are compared in the following section of the paper. The pipeline section with the lowest RRS value is identified as the riskiest section, which may help pipeline operators prioritize risk management on the lowest-scoring section.
To improve the reliability and safety of the section with the lowest RRS value, the operator may begin with its lowest-scoring index, e.g., a low-scoring design index.
The relative risk score of each section is computed using Eqs. (1), (2), (3), and (4). An example is presented as follows:
Table 6 displays the relative risk score (RRS) results of the proposed model, together with the index sum and leak impact factor. It includes the index sum entry values (third-party damage, corrosion, design, and incorrect operations) and the leak impact factor entry values (product hazard, leak volume, dispersion factor, and receptors). Table 6 shows that Section 3 of the pipeline has the lowest RRS value and is ranked as the riskiest part of the pipeline, as it passes through the river Nile. Section 3 is therefore the starting point for risk management. The risk assessor can start by improving the design index of this section, as it has the lowest value among the index sum indices.
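The prioritization logic described above (lowest RRS first, then the lowest-scoring index within that section) can be sketched as follows. All section numbers and scores in the example are hypothetical placeholders, not values from Table 6, and the function names are illustrative.

```python
def rank_sections(rrs_by_section):
    """Return section ids ordered from riskiest (lowest RRS) to safest."""
    return sorted(rrs_by_section, key=rrs_by_section.get)


def weakest_index(index_scores):
    """Return the name of the lowest-scoring index of a section."""
    return min(index_scores, key=index_scores.get)


# Hypothetical RRS values per section.
rrs = {1: 9.1, 2: 8.4, 3: 7.5, 4: 10.2}
# Hypothetical index scores for the riskiest section.
indices = {
    "third_party_damage": 70,
    "corrosion": 65,
    "design": 50,
    "incorrect_operations": 66,
}

ranking = rank_sections(rrs)
print(ranking[0], weakest_index(indices))
```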

Results and discussions
RRS = IS / LIF = 250/36 = 6.94
The design index record can be enhanced by doing the following:
• Increase pipe safety factor.
• Increase system safety factor.
• Avoid surge potential.
• Make a system hydrotest to ensure pipeline integrity.
To compare the RRS results of the proposed model with those of the traditional method, Table 7 displays the RRS output values and ranks for both methods. Figure 10 depicts the relationship between the traditional method and the proposed model.
To investigate the relationship between the qualitative method and the proposed model further, the degree of correlation between the index sum and leak impact factor obtained by the proposed subtractive clustering fuzzy model and those obtained by the qualitative method was calculated as:
$$\rho = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{E\left[(x - \bar{x})(y - \bar{y})\right]}{\sigma_x \sigma_y}$$
where x = qualitative output results, y = fuzzy inference output results, x̄ = mean value of x, ȳ = mean value of y, cov(x, y) = covariance of x and y, σ_x = standard deviation of x, and σ_y = standard deviation of y. Figure 11 shows high correlation coefficients (ρ = 0.9999 for the index sum, ρ = 0.9903 for the leak impact factor, and ρ = 0.9821 for the RRS), which indicates the effectiveness of the TS fuzzy inference method based on subtractive clustering.
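The correlation coefficient between the qualitative results and the fuzzy model outputs can be computed as in the sketch below, which uses the standard Pearson definition (covariance divided by the product of the standard deviations). The function name and the sample values are hypothetical and are not taken from Table 7.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation: cov(x, y) / (sigma_x * sigma_y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # The 1/n factors of the covariance and deviations cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


# Hypothetical qualitative scores and fuzzy-model outputs.
qualitative = [6.9, 8.2, 7.4, 9.8, 10.5]
fuzzy_model = [7.0, 8.1, 7.6, 9.7, 10.4]
print(round(pearson_correlation(qualitative, fuzzy_model), 4))
```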

Conclusion
The indexing pipeline risk assessment methodology is integrated with subtractive clustering fuzzy logic to deal with the uncertainty of real-world conditions and to avoid the difficulty of constructing a large number of rules. In a conventional fuzzy inference system, the computational complexity grows with the number of system variables because the number of rules increases exponentially with the number of variables.
The proposed approach for pipeline risk assessment is demonstrated using a case study of a petroleum pipeline, with the results of the proposed model compared to those of the qualitative method. The pipeline is divided into seven sections, and the risk assessment procedure is carried out for each section using both the qualitative method and the proposed model. The results show that the RRS values computed using the proposed model are consistent with those obtained using the qualitative method, with high correlation and accuracy. The proposed model is evaluated using the training RMSE, testing RMSE, and R², which are 5.9653 × 10⁻⁸, 7.35411 × 10⁻⁸, and 1 for the index sum model, and 1.4065, 8.7814, and 0.9601 for the leak impact factor model, respectively. The proposed model is thus shown to be efficient for pipeline risk assessment using a fuzzy clustering approach. Future work will address risk assessment of several facilities within the offshore industry.

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Figure 11. Correlation coefficient degree between qualitative and subtractive clustering method.